Abstract. We address the predicate generation problem in the context of loop invariant
inference. Motivated by the interpolation-based abstraction refinement technique, we
apply the interpolation theorem to synthesize predicates implicitly implied by program texts.
Our technique improves the effectiveness and efficiency of the learning-based loop
invariant inference algorithm of Jung, Kong, Wang and Yi (2010). We report experimental
results on examples from Linux, SPEC2000, and the Tar utility.



One way to prove that an annotated loop satisfies its pre- and post-conditions is by giving
loop invariants. In an annotated loop, pre- and post-conditions specify the intended effects of
the loop. The actual behavior of the annotated loop, however, does not necessarily
conform to its specification. Through loop invariants, verification tools can automatically
check whether the annotated loop fulfills its specification [9].

Finding loop invariants is tedious and sometimes requires intelligence. Recently, an
automated technique based on algorithmic learning and predicate abstraction has been proposed [14].
Given a fixed set of atomic predicates and an annotated loop, the learning-based technique
can infer a quantifier-free loop invariant over the given atomic predicates. By employing
a learning algorithm and a mechanical teacher, the new technique is able to generate loop
invariants without constructing abstract models or computing fixed points.

As in other techniques based on predicate abstraction, the selection of atomic predicates
is crucial to the effectiveness of the learning-based technique. Oftentimes, users extract
atomic predicates from program texts heuristically. If this simple strategy does not yield
the atomic predicates necessary to express any loop invariant, the loop invariant inference
algorithm will not be able to infer a loop invariant. Even when the heuristic does give the
necessary atomic predicates, it may select too many redundant predicates and impede the
efficiency of the loop invariant inference algorithm.

One way to circumvent this problem is to generate atomic predicates on demand. Several
techniques have been developed to synthesize atomic predicates by interpolation [HI [121 EHl
120]. Let $A$ and $B$ be logic formulae. An interpolant $I$ of $A$ and $B$ is a formula such that
$A \Rightarrow I$ and $I \wedge B$ is inconsistent. Moreover, the non-logical symbols in $I$ must occur in both
$A$ and $B$. By Craig's interpolation theorem, an interpolant $I$ always exists for any
first-order formulae $A$ and $B$ when $A \wedge B$ is inconsistent. The interpolant $I$ can be seen as a
concise summary of $A$ with respect to $B$. Indeed, many abstraction refinement techniques
for software model checking [H [lU |T2l [191 EO] have used interpolation to synthesize atomic
predicates.

Inspired by the refinement technique in software model checking, we develop an
interpolation-based technique to synthesize atomic predicates in the context of
learning-based loop invariant inference. Our algorithm does not add new atomic predicates by
interpolating invalid execution paths in control flow graphs. We instead interpolate the loop
body with purported loop invariants from the learning algorithm. We adopt the existing
interpolating theorem provers [H [21 [3 [12] for the interpolation. With our new predicate
generation technique, we can improve the effectiveness and efficiency of the existing
learning-based loop invariant inference technique. Constructing the set of atomic predicates is
fully automatic and on-demand.

1.1. Example. Consider the following annotated loop: 

$\{ n \geq 0 \wedge x = n \wedge y = n \}$
while $x > 0$ do
    $x = x - 1$; $y = y - 1$
done
$\{ x + y = 0 \}$

Assume that variables $x$ and $y$ both have the value $n \geq 0$ before entering the loop. The loop
body decreases each variable by one until the variable $x$ becomes zero. We want to show that
$x + y$ is zero after executing the loop. This requires us to establish the fact that variables
$x$ and $y$ have the same value during iterations and eventually become zero after exiting the
loop. To express this fact as a loop invariant, we require the predicate $x = y$. The program
text, however, does not reveal this equality explicitly. Moreover, atomic predicates from the
program text cannot express any loop invariant that establishes the given specification.
Using atomic predicates in the program text is not sufficient in this case. However, we can
exploit the fact that any loop invariant $\iota$ should be weaker than the pre-condition $\delta$ and
stronger than the disjunction of the loop guard $\kappa$ and the post-condition $\epsilon$
($\delta \Rightarrow \iota \Rightarrow \kappa \vee \epsilon$).
Then, we can generate an interpolant from the inconsistent formula $\delta \wedge \neg(\kappa \vee \epsilon)$ and extract the atomic
predicates in it. From the interpolant of $(n \geq 0 \wedge x = n \wedge y = n) \wedge \neg(x > 0 \vee x + y = 0)$, we
obtain two atomic predicates $x = y$ and $2y \geq 0$. Observe that the interpolation is able to
synthesize the necessary predicate $x = y$. In fact, the loop invariant $x = y \wedge x \geq 0$ establishes
the specification of the loop.
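The claim that $x = y \wedge x \geq 0$ is a loop invariant can be sanity-checked by brute force. The following is a minimal sketch, not the paper's method: it replaces the theorem prover by enumeration over a small integer range (an assumption made only for illustration), and the names `is_invariant`, `pre`, `guard`, `post`, and `body` are hypothetical.

```python
from itertools import product

def pre(n, x, y):
    # delta: n >= 0 and x = n and y = n
    return n >= 0 and x == n and y == n

def guard(n, x, y):
    # kappa: x > 0
    return x > 0

def post(n, x, y):
    # epsilon: x + y = 0
    return x + y == 0

def body(n, x, y):
    # loop body: x = x - 1; y = y - 1
    return n, x - 1, y - 1

def is_invariant(candidate, bound=6):
    """Check the three invariant conditions on all states in [-bound, bound]^3.

    This is a bounded check, so a True result is evidence, not a proof.
    """
    for state in product(range(-bound, bound + 1), repeat=3):
        if pre(*state) and not candidate(*state):
            return False  # violates: delta implies candidate
        if candidate(*state) and not guard(*state) and not post(*state):
            return False  # violates: candidate and not kappa implies epsilon
        if candidate(*state) and guard(*state) and not candidate(*body(*state)):
            return False  # violates: candidate is preserved by the loop body
    return True

def interpolation_based(n, x, y):
    # candidate built from the predicate x = y obtained by interpolation
    return x == y and x >= 0

def program_text_only(n, x, y):
    # candidate built without the predicate x = y
    return x >= 0
```

On this bounded domain the candidate with $x = y$ passes all three checks, whereas the candidate without it fails, for instance at $x = 0, y = 1$ where the loop exits but $x + y \neq 0$.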

1.2. Related Work. Jung et al. introduce the loop invariant inference technique based
on algorithmic learning. Kong et al. [16] extend this technique to quantified loop invariant
inference. Both algorithms require users to provide atomic predicates. The present work
addresses this problem for the case of quantifier-free loop invariants.

Recently, Lee et al. introduced a learning-based technique for termination analysis.
The technique infers the transition invariant of a given loop as a proof of termination by
combining algorithmic learning and decision procedures. In that paper, the authors design a
heuristic to generate atomic transition predicates. Adapting the technique of the present
paper to transition invariant inference is interesting future work.

Many interpolation algorithms and their implementations are available [H [21 El I19j.
Interpolation-based techniques for predicate refinement in software model checking are
proposed in [HI [m [121 [131 120]. Abstract models used in these techniques, however, may require
excessive invocations of theorem provers. Another interpolation-based technique for
first-order invariants is developed in [21]. Like our approach, the paramodulation-based technique
presented in that paper does not construct abstract models. It, however, only generates
invariants in first-order logic with equality. A template-based predicate generation technique
for quantified invariants is proposed in [22]. The technique reduces the invariant inference
problem to constraint programming and generates predicates in user-provided templates.

1.3. Paper Organization. Section 2 gives preliminaries for the presentation. Section 3
reviews the learning-based loop invariant inference framework [14]. Section 4 presents our
interpolation-based predicate generation technique. Section 5 presents the loop invariant
inference algorithms with automatic predicate generation. Section 6 presents and discusses
our experimental results. Section 7 concludes this work.

2. Preliminaries 

2.1. Quantifier-free Formulae. Let $QF$ denote the quantifier-free logic with equality,
linear inequality, and uninterpreted functions. Define the domain $D = \mathbb{Q} \cup \mathbb{B}$ where $\mathbb{Q}$ is the
set of rational numbers and $\mathbb{B} = \{F, T\}$ is the Boolean domain. Fix a set $X$ of variables.
A valuation over $X$ is a function from $X$ to $D$. The class of valuations over $X$ is denoted
by $Val_X$. For any formula $\theta \in QF$ and valuation $\nu$ over the free variables in $\theta$, $\theta$ is satisfied by
$\nu$ (written $\nu \models \theta$) if $\theta$ evaluates to $T$ under $\nu$; $\theta$ is inconsistent if $\theta$ is not satisfied by any
valuation. Given a formula $\theta \in QF$, a satisfiability modulo theories (SMT) solver returns a
satisfying valuation $\nu$ of $\theta$ if $\theta$ is not inconsistent p] [7].

2.2. Interpolation Theorem. For $\theta \in QF$, we denote the set of non-logical symbols
occurring in $\theta$ by $\sigma(\theta)$. Let $\Theta = [\theta_1, \ldots, \theta_m]$ be a sequence with $\theta_i \in QF$ for $1 \leq i \leq m$.
The sequence $\Theta$ is inconsistent if $\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_m$ is inconsistent. The sequence
$\Lambda = [\lambda_0, \lambda_1, \ldots, \lambda_m]$ of quantifier-free formulae is an inductive interpolant of $\Theta$ if

• $\lambda_0 = T$ and $\lambda_m = F$;

• for all $1 \leq i \leq m$, $\lambda_{i-1} \wedge \theta_i \Rightarrow \lambda_i$; and

• for all $1 \leq i < m$, $\sigma(\lambda_i) \subseteq \sigma(\theta_i) \cap \sigma(\theta_{i+1})$.

The third condition of interpolants makes them attractive for predicate generation:
since the set of symbols in an interpolant must be contained in the intersection of the sets of
symbols in two inconsistent formulae, the interpolant sometimes consists of predicates which do
not appear in either of the two.
The interpolation theorem states that an inductive interpolant exists for any inconsistent
sequence [6l [191 EQ]. Some existing theorem provers [D El [3l [19] can generate interpolants
from inconsistent sequences.
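To make the three conditions concrete for a two-element sequence $[A, B]$, the sketch below checks whether a candidate $\lambda$ makes $[T, \lambda, F]$ an inductive interpolant. It is an illustration under stated assumptions: validity is checked by enumerating a small integer range instead of calling a theorem prover, only variables are counted as non-logical symbols, and the symbol sets are supplied by hand. The formulae are those of the example in Section 1.1; all function names are hypothetical.

```python
from itertools import product

# Bounded stand-in for a theorem prover: all states over a small integer range.
STATES = [dict(n=n, x=x, y=y) for n, x, y in product(range(-4, 5), repeat=3)]

def valid(formula):
    """True if the formula holds in every enumerated state (bounded check only)."""
    return all(formula(s) for s in STATES)

def is_interpolant(a, b, lam, sym_a, sym_b, sym_lam):
    """Check that [T, lam, F] satisfies the interpolant conditions for [a, b]."""
    a_implies_lam = valid(lambda s: (not a(s)) or lam(s))   # T and a imply lam
    lam_refutes_b = valid(lambda s: not (lam(s) and b(s)))  # lam and b imply F
    shared_symbols = sym_lam <= (sym_a & sym_b)             # symbols of lam are common
    return a_implies_lam and lam_refutes_b and shared_symbols

# A is the pre-condition; B is the negation of (guard or post-condition).
A = lambda s: s["n"] >= 0 and s["x"] == s["n"] and s["y"] == s["n"]
B = lambda s: not (s["x"] > 0 or s["x"] + s["y"] == 0)

# Candidate from the text: x = y and 2y >= 0.
LAM = lambda s: s["x"] == s["y"] and 2 * s["y"] >= 0
```

Note that the candidate mentions only $x$ and $y$, which occur in both formulae, while $A$ itself fails the symbol condition because it mentions $n$.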



2.3. Predicate Abstraction. Let $QF[P]$ denote the set of quantifier-free formulae over the
set $P$ of atomic predicates. A cube over $P$ is a conjunction
$p_1 \wedge \cdots \wedge p_k \wedge \neg p_{k+1} \wedge \cdots \wedge \neg p_{k+k'}$
where all $p_j \in P$ are distinct. We say that $k + k'$ is the size of the cube. A minterm over
$P$ is a cube whose size is $|P|$.

Consider the set $Bool[B_P]$ of Boolean formulae over the set $B_P$ of Boolean variables
where $B_P = \{b_p : p \in P\}$. An abstract valuation is a function from $B_P$ to $\mathbb{B}$. We write $Val_{B_P}$
for the set of abstract valuations. A Boolean formula in $Bool[B_P]$ is a canonical monomial
if it is a conjunction of literals, where each Boolean variable in $B_P$ occurs exactly once. The
following functions [HI [15] relate formulae in $QF[P]$ and $Bool[B_P]$ (Figure 1):

$$\gamma(\beta) = \beta[b_p \mapsto p]$$

$$\alpha(\theta) = \bigvee \{\beta \in Bool[B_P] : \beta \text{ is a canonical monomial and } \theta \wedge \gamma(\beta) \text{ is satisfiable}\}$$

$$\gamma^*(\mu) = \bigwedge_{\mu(b_p) = T} \{p\} \wedge \bigwedge_{\mu(b_p) = F} \{\neg p\}$$

$$\alpha^*(\nu) = \mu \text{ where } \mu(b_p) = \begin{cases} T & \text{if } \nu \models p \\ F & \text{if } \nu \not\models p \end{cases}$$

Figure 1. Relating $QF$ and $Bool[B_P]$

The abstraction function $\alpha$ maps any quantifier-free formula to a Boolean formula in
$Bool[B_P]$, whereas the concretization function $\gamma$ maps any Boolean formula in $Bool[B_P]$
to a quantifier-free formula in $QF[P]$. Moreover, the function $\alpha^*$ maps a valuation over $X$
to a valuation over $B_P$; the function $\gamma^*$ maps a valuation over $B_P$ to a quantifier-free formula
in $QF[P]$. The function $\Gamma(\nu)$ specifies the valuation $\nu$ in $QF$. Observe that the quantifier-free
formula $\gamma(\beta)$ is a minterm when the Boolean formula $\beta$ is a canonical monomial. Observe also
that the formula $\gamma(\alpha(\theta))$ is in disjunctive normal form and equivalent to $\theta \in QF[P]$.

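Consider, for instance, $P = \{n \geq 0, x = n, y = n\}$ and $B_P = \{b_{n \geq 0}, b_{x=n}, b_{y=n}\}$. We
have $\gamma(b_{n \geq 0} \wedge \neg b_{x=n}) = n \geq 0 \wedge \neg(x = n)$ and

$$\alpha(\neg(x = y)) = \begin{array}{l}
(b_{n \geq 0} \wedge b_{x=n} \wedge \neg b_{y=n}) \vee (b_{n \geq 0} \wedge \neg b_{x=n} \wedge b_{y=n}) \vee \\
(b_{n \geq 0} \wedge \neg b_{x=n} \wedge \neg b_{y=n}) \vee (\neg b_{n \geq 0} \wedge b_{x=n} \wedge \neg b_{y=n}) \vee \\
(\neg b_{n \geq 0} \wedge \neg b_{x=n} \wedge b_{y=n}) \vee (\neg b_{n \geq 0} \wedge \neg b_{x=n} \wedge \neg b_{y=n}).
\end{array}$$

Moreover, $\alpha^*(\nu)(b_{n \geq 0}) = \alpha^*(\nu)(b_{x=n}) = \alpha^*(\nu)(b_{y=n}) = T$ when $\nu(n) = \nu(x) = \nu(y) = 1$.
And $\gamma^*(\mu) = n \geq 0 \wedge x = n \wedge \neg(y = n)$ when $\mu(b_{n \geq 0}) = \mu(b_{x=n}) = T$ but $\mu(b_{y=n}) = F$.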
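The abstraction and concretization of valuations in this example can be replayed in a few lines. The sketch below is illustrative only: predicates are represented as Python functions keyed by a display string, `alpha` enumerates a small integer range instead of querying an SMT solver (so in general it may miss monomials that are satisfiable only outside that range), and all names are hypothetical.

```python
from itertools import product

# The predicate set P of the example; the dictionary keys play the role of b_p.
PREDICATES = {
    "n >= 0": lambda s: s["n"] >= 0,
    "x = n": lambda s: s["x"] == s["n"],
    "y = n": lambda s: s["y"] == s["n"],
}

def alpha_star(state):
    """Map a concrete valuation to the abstract valuation it induces."""
    return {name: pred(state) for name, pred in PREDICATES.items()}

def gamma_star(mu):
    """Map an abstract valuation to a minterm over P, rendered as text."""
    return " and ".join(name if mu[name] else f"not ({name})" for name in PREDICATES)

def alpha(theta, bound=2):
    """Collect the canonical monomials (as truth-value tuples) consistent with theta.

    Bounded enumeration stands in for the satisfiability checks in the definition.
    """
    monomials = set()
    for n, x, y in product(range(-bound, bound + 1), repeat=3):
        state = {"n": n, "x": x, "y": y}
        if theta(state):
            monomials.add(tuple(alpha_star(state).values()))
    return monomials
```

For $\neg(x = y)$ this enumeration finds six monomials, matching the six disjuncts listed above; the two monomials containing both $b_{x=n}$ and $b_{y=n}$ positively are excluded because they force $x = y$.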

The following lemmas prove useful properties of these abstraction and concretization 
functions. 

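Lemma 2.1. Let $P$ be a set of atomic predicates, $\theta \in QF[P]$, and $\beta$ a canonical monomial
in $Bool[B_P]$. Then $\theta \wedge \gamma(\beta)$ is satisfiable if and only if $\gamma(\beta) \Rightarrow \theta$.

Proof. Let $\theta' = \bigvee_i \theta_i \in QF[P]$ be a formula in disjunctive normal form such that $\theta' \Leftrightarrow \theta$.
Note that each $\theta_i$ is a cube over the set $P$. Let $Lit(\theta)$ be the set of literals in the formula $\theta$. Then
$Lit(\theta_i) \subseteq P \cup \{\neg p : p \in P\}$.

Assume $\theta \wedge \gamma(\beta)$ is satisfiable. Then $\theta' \wedge \gamma(\beta)$ is satisfiable and $\theta_i \wedge \gamma(\beta)$ is satisfiable for
some $i$. Since $\beta$ is a canonical monomial, $\gamma(\beta)$ is a minterm over the set $P$ and $Lit(\theta_i) \subseteq Lit(\gamma(\beta))$.
Hence $\theta_i \wedge \gamma(\beta)$ being satisfiable implies $\gamma(\beta) \Rightarrow \theta_i$. We have $\gamma(\beta) \Rightarrow \theta$.

The other direction is trivial. □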

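Lemma 2.2. Let $P$ be a set of atomic predicates, $\theta, \rho \in QF[P]$. Then
$\theta \Rightarrow \rho$ implies $\alpha(\theta) \Rightarrow \alpha(\rho)$.

Proof. Let $\alpha(\theta) = \bigvee_i \beta_i$ where $\beta_i$ is a canonical monomial and $\theta \wedge \gamma(\beta_i)$ is satisfiable. By
Lemma 2.1, $\gamma(\beta_i) \Rightarrow \theta$. Hence $\gamma(\beta_i) \Rightarrow \rho$ and $\rho \wedge \gamma(\beta_i)$ is satisfiable. □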
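Lemma 2.3. Let $P$ be a set of atomic propositions and $\theta \in QF[P]$. Then $\theta \Leftrightarrow \gamma(\alpha(\theta))$.

Proof. Let $\theta' = \bigvee_i \theta_i$ be a quantifier-free formula in disjunctive normal form such that
$\theta' \Leftrightarrow \theta$. Let $\mu \in Val_{B_P}$. Define

$$\chi(\mu) = \bigwedge(\{b_p : \mu(b_p) = T\} \cup \{\neg b_p : \mu(b_p) = F\}).$$

Note that $\chi(\mu)$ is a canonical monomial and $\mu \models \chi(\mu)$.

Assume $\nu \models \theta$. Then $\nu \models \theta_i$ for some $i$. Consider the canonical monomial $\chi(\alpha^*(\nu))$.
Note that $\nu \models \gamma(\chi(\alpha^*(\nu)))$. Thus $\chi(\alpha^*(\nu))$ is a disjunct in $\alpha(\theta)$. We have $\nu \models \gamma(\alpha(\theta))$.

Conversely, assume $\nu \models \gamma(\alpha(\theta))$. Then $\nu \models \gamma(\beta)$ for some canonical monomial $\beta$ and
$\gamma(\beta) \wedge \theta$ is satisfiable. By Lemma 2.1, $\gamma(\beta) \Rightarrow \theta$. Hence $\nu \models \theta$. □

Lemma 2.4. Let $P$ be a set of atomic propositions, $\theta \in QF[P]$, $\beta \in Bool[B_P]$, and $\nu$ a
valuation for $X$. Then

(1) $\nu \models \theta$ if and only if $\alpha^*(\nu) \models \alpha(\theta)$; and

(2) $\nu \models \gamma(\beta)$ if and only if $\alpha^*(\nu) \models \beta$.

Proof.

(1) Assume $\nu \models \theta$. $\chi(\alpha^*(\nu))$ is a canonical monomial. Observe that $\nu \models \gamma(\chi(\alpha^*(\nu)))$.
Hence $\gamma(\chi(\alpha^*(\nu))) \wedge \theta$ is satisfiable. By the definition of $\alpha(\theta)$ and since $\chi(\alpha^*(\nu))$ is canonical,
$\chi(\alpha^*(\nu)) \Rightarrow \alpha(\theta)$. $\alpha^*(\nu) \models \alpha(\theta)$ follows from $\alpha^*(\nu) \models \chi(\alpha^*(\nu))$.

Conversely, assume $\alpha^*(\nu) \models \alpha(\theta)$. Then $\alpha^*(\nu) \models \beta$ where $\beta$ is a canonical monomial
and $\gamma(\beta) \wedge \theta$ is satisfiable. By the definition of $\alpha^*(\nu)$, $\nu \models \gamma(\beta)$. Moreover, $\gamma(\beta) \Rightarrow \theta$
by Lemma 2.1. Hence $\nu \models \theta$.

(2) Assume $\nu \models \gamma(\beta)$. By Lemma 2.4 (1), $\alpha^*(\nu) \models \alpha(\gamma(\beta))$. Note that $\beta = \alpha(\gamma(\beta))$. Thus
$\alpha^*(\nu) \models \beta$.
□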

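Lemma 2.5. Let $P$ be a set of atomic propositions, $\theta \in QF[P]$, and $\mu$ a Boolean valuation
for $B_P$. Then $\gamma^*(\mu) \Rightarrow \theta$ if and only if $\mu \models \alpha(\theta)$.

Proof. Assume $\gamma^*(\mu) \Rightarrow \theta$. By Lemma 2.2, $\alpha(\gamma^*(\mu)) \Rightarrow \alpha(\theta)$. Note that $\gamma^*(\mu) = \gamma(\chi(\mu))$.
By Lemma 2.3, $\chi(\mu) \Rightarrow \alpha(\theta)$. Since $\mu \models \chi(\mu)$, we have $\mu \models \alpha(\theta)$.

Conversely, assume $\mu \models \alpha(\theta)$. We have $\chi(\mu) \Rightarrow \alpha(\theta)$ by the definition of $\chi(\mu)$. Let
$\nu \models \gamma^*(\mu)$, that is, $\nu \models \gamma(\chi(\mu))$. By Lemma 2.4 (2), $\alpha^*(\nu) \models \chi(\mu)$. Since $\chi(\mu) \Rightarrow \alpha(\theta)$,
$\alpha^*(\nu) \models \alpha(\theta)$. By Lemma 2.4 (1), $\nu \models \theta$. Therefore, $\gamma^*(\mu) \Rightarrow \theta$. □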



2.4. CDNF Learning Algorithm. The CDNF algorithm [3] is an exact learning algorithm for
Boolean formulae based on monotone theory. It infers an unknown target formula by posing
queries to a teacher. The teacher is responsible for answering two types of queries. The
learning algorithm may ask if a valuation satisfies the target formula by a membership query.
Or it may ask if a conjectured formula is equivalent to the target in an equivalence query.
Using the answers to the queries, the CDNF algorithm infers a Boolean formula equivalent
to the unknown target within a number of queries polynomial in the formula size of the
target [4].

2.5. Programs. We consider the following imperative language in this paper: 

Stmt = nop | Stmt; Stmt | x := Exp | x := nondet | if BExp then Stmt else Stmt

Exp = n | x | Exp + Exp | Exp − Exp

BExp = F | x | ¬BExp | BExp ∧ BExp | Exp < Exp | Exp = Exp

Two basic types are available: natural numbers and Booleans. A term in Exp is a natural 
number; a term in BExp is of Boolean type. The keyword nondet denotes an arbitrary value 
in the type of the assigned variable. An annotated loop is of the form: 

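$\{\delta\}$ while $\kappa$ do $S_1; S_2; \cdots; S_m$ done $\{\epsilon\}$

The BExp formula $\kappa$ is the loop guard. The BExp formulae $\delta$ and $\epsilon$ are the precondition and
postcondition of the annotated loop respectively.
Define $X^{\langle k \rangle} = \{x^{\langle k \rangle} : x \in X\}$. For any term $e$ over $X$, define $e^{\langle k \rangle} = e[X \mapsto X^{\langle k \rangle}]$. A
transition formula $[\![S]\!]$ for a statement $S$ is a first-order formula over variables $X^{\langle 0 \rangle} \cup X^{\langle 1 \rangle}$
defined as follows.

$$[\![\text{nop}]\!] \triangleq \bigwedge_{x \in X} x^{\langle 1 \rangle} = x^{\langle 0 \rangle}$$

$$[\![x := \text{nondet}]\!] \triangleq \bigwedge_{y \in X \setminus \{x\}} y^{\langle 1 \rangle} = y^{\langle 0 \rangle}$$

$$[\![x := e]\!] \triangleq x^{\langle 1 \rangle} = e^{\langle 0 \rangle} \wedge \bigwedge_{y \in X \setminus \{x\}} y^{\langle 1 \rangle} = y^{\langle 0 \rangle}$$

$$[\![S_0; S_1]\!] \triangleq \exists X. [\![S_0]\!][X^{\langle 1 \rangle} \mapsto X] \wedge [\![S_1]\!][X^{\langle 0 \rangle} \mapsto X]$$

$$[\![\text{if } p \text{ then } S_0 \text{ else } S_1]\!] \triangleq (p^{\langle 0 \rangle} \wedge [\![S_0]\!]) \vee (\neg p^{\langle 0 \rangle} \wedge [\![S_1]\!])$$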



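Let $\nu$ and $\nu'$ be valuations, and $S$ a statement. We write $\nu \xrightarrow{S} \nu'$ if $[\![S]\!]$ evaluates
to true by assigning $\nu(x)$ and $\nu'(x)$ to $x^{\langle 0 \rangle}$ and $x^{\langle 1 \rangle}$ for each $x \in X$ respectively. Given a
sequence of statements $S_1; S_2; \cdots; S_m$, a program execution
$\nu_0 \xrightarrow{S_1} \nu_1 \xrightarrow{S_2} \cdots \xrightarrow{S_m} \nu_m$ is a
sequence $[\nu_0, \nu_1, \ldots, \nu_m]$ of valuations such that $\nu_i \xrightarrow{S_{i+1}} \nu_{i+1}$ for $0 \leq i < m$.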

A precondition $Pre(\theta : S)$ for $\theta \in QF$ with respect to the statement $S$, which is a
first-order formula that entails $\theta$ after executing the statement $S$, is defined as follows.

$$Pre(\theta : \text{nop}) \triangleq \theta$$

$$Pre(\theta : x := \text{nondet}) \triangleq \forall x. \theta$$

$$Pre(\theta : x := e) \triangleq \theta[x \mapsto e]$$

$$Pre(\theta : S_0; S_1) \triangleq Pre(Pre(\theta : S_1) : S_0)$$

$$Pre(\theta : \text{if } p \text{ then } S_0 \text{ else } S_1) \triangleq (p \Rightarrow Pre(\theta : S_0)) \wedge (\neg p \Rightarrow Pre(\theta : S_1))$$

Observe that all universal quantifiers occur positively in $Pre(\theta : S)$ for any $S$. They can be
eliminated by Skolem constants [TOl [T7].
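As a worked illustration of these rules (a sketch using the loop body of the example in Section 1.1, written here with `:=`), the precondition of the candidate invariant $x = y \wedge x \geq 0$ with respect to the two assignments is obtained by substituting from the last statement backwards:

```latex
\begin{align*}
Pre(x = y \wedge x \geq 0 : y := y - 1)
  &= (x = y - 1 \wedge x \geq 0) \\
Pre(x = y \wedge x \geq 0 : x := x - 1;\ y := y - 1)
  &= Pre(Pre(x = y \wedge x \geq 0 : y := y - 1) : x := x - 1) \\
  &= Pre(x = y - 1 \wedge x \geq 0 : x := x - 1) \\
  &= (x - 1 = y - 1 \wedge x - 1 \geq 0)
\end{align*}
```

For integer-valued $x$ and $y$, the formula $x = y \wedge x \geq 0 \wedge x > 0$ implies the last line, so the candidate is preserved by the loop body.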



2.6. Problem Definition. Given an annotated loop, 

$\{\delta\}$ while $\kappa$ do $S_1; S_2; \cdots; S_m$ done $\{\epsilon\}$,

the loop invariant inference problem is to compute an invariant $\iota \in QF$, that is, a formula
satisfying

(1) $\delta \Rightarrow \iota$;

(2) $\iota \wedge \neg\kappa \Rightarrow \epsilon$; and

(3) $\iota \wedge \kappa \Rightarrow Pre(\iota : S_1; S_2; \cdots; S_m)$.

Observe that condition (2) is equivalent to $\iota \Rightarrow \epsilon \vee \kappa$. The first two conditions specify
necessary and sufficient conditions of any loop invariants respectively. The formulae $\delta$ and
$\epsilon \vee \kappa$ are called the strongest and weakest approximations to loop invariants respectively.



[Figure 2: a learning algorithm poses queries such as MEM(μ) to a mechanical teacher, which
consults the program text and replies YES, NO, or a counterexample.]

Figure 2. Learning-based Framework



We are particularly interested in the following variant of the loop invariant inference 
problem: 

(a) Given a set $P$ of atomic predicates, finding an invariant $\iota \in QF[P]$; and

(b) Given an annotated loop, finding a suitable set P of atomic predicates that contains 
enough predicates to express at least one of the invariants. 

Jung et al. propose an algorithmic-learning-based technique that solves part (a)
of the problem. The technique combines predicate abstraction and decision procedures to
make a mechanical teacher that answers the queries from the learning algorithm. With
predicate abstraction, the learning algorithm becomes an efficient engine for exploring possible
combinations of predicates to find an invariant.

In this paper, we address part (b) of the problem using interpolation. As already
stated in Section 2.2, interpolation provides a systematic method for predicate generation
and is widely adopted in software model checking. We explain the application of interpolation
in the context of learning-based loop invariant inference.



3. Inferring Loop Invariants with Algorithmic Learning 

In this section, we review the learning-based framework for inferring quantifier-free loop
invariants due to Jung et al. [14]. Given a set $P$ of atomic predicates, the authors show how
to apply a learning algorithm for Boolean formulae to infer quantifier-free loop invariants
freely generated by $P$. They first adopt predicate abstraction to relate quantifier-free and
Boolean formulae. They then design a mechanical teacher to guide the learning algorithm to
a Boolean formula whose concretization is a loop invariant. We first explain the algorithms
for resolving queries from the learning algorithm and then the main loop of learning-based
loop invariant inference.

3.1. Answering Queries from Algorithmic Learning. Figure 2 shows a high-level view
of the learning-based loop invariant inference framework. In the framework, a learning
algorithm is used to drive the search for loop invariants. It "learns" an unknown loop invariant
by querying a mechanical teacher. The mechanical teacher of course does not know any
loop invariant. It nevertheless tries to answer these queries using information derived from
program texts. In this case, the teacher uses approximations to loop invariants. By
employing a learning algorithm, it suffices to design a mechanical teacher to find loop invariants.
Moreover, the new framework neither constructs abstract models nor computes fixed points.
It can be more scalable than traditional techniques.

After formulae in $QF$ and valuations in $Val_X$ are abstracted to those in $Bool[B_P]$ and
$Val_{B_P}$ respectively, a learning algorithm is used to infer abstractions of loop invariants. Let
$\xi$ be an unknown target Boolean formula in $Bool[B_P]$. A learning algorithm computes a
representation of the target $\xi$ by interacting with a teacher. The teacher should answer the
following queries:

• Membership queries. Let $\mu \in Val_{B_P}$ be an abstract valuation. The membership query
$MEM(\mu)$ asks if the unknown target $\xi$ is satisfied by $\mu$. If so, the teacher answers YES;
otherwise, NO.

• Equivalence queries. Let $\beta \in Bool[B_P]$ be an abstract conjecture. The equivalence query
$EQ(\beta)$ asks if $\beta$ is equivalent to the unknown target $\xi$. If so, the teacher answers YES.
Otherwise, the teacher gives an abstract valuation $\mu$ such that the exclusive disjunction
of $\beta$ and $\xi$ is satisfied by $\mu$. The abstract valuation $\mu$ is called an abstract counterexample.

With predicate abstraction and a learning algorithm for Boolean formulae at hand, it
remains to design a mechanical teacher to guide the learning algorithm to the abstraction
of a loop invariant. The key idea in [14] is to exploit approximations to loop invariants. An
under-approximation to loop invariants is a quantifier-free formula $\underline{\iota}$ which is stronger than
some loop invariants of the given annotated loop; an over-approximation is a quantifier-free
formula $\overline{\iota}$ which is weaker than some loop invariants.

In the following, we explain exactly how we can answer queries from the learning algorithm
using under- and over-approximations to loop invariants.



3.1.1. Answering Membership Queries. In the membership query $MEM(\mu)$, the teacher is
required to answer whether $\mu \models \alpha(\iota)$. We concretize the Boolean valuation $\mu$ and check
it against the approximations. If the concretization $\gamma^*(\mu)$ is inconsistent (that is, $\gamma^*(\mu)$ is
unsatisfiable), we simply answer NO for the membership query. Otherwise, there are three
cases:

(1) $\gamma^*(\mu) \Rightarrow \underline{\iota}$. Thus $\mu \models \alpha(\underline{\iota})$ (Lemma 2.5). And $\mu \models \alpha(\iota)$ by Lemma 2.2.

(2) $\gamma^*(\mu) \not\Rightarrow \overline{\iota}$. Thus $\mu \not\models \alpha(\overline{\iota})$ (Lemma 2.5). That is, $\mu \models \neg\alpha(\overline{\iota})$. Since
$\iota \Rightarrow \overline{\iota}$, we have $\mu \not\models \alpha(\iota)$ by Lemma 2.2.

(3) Otherwise, we cannot determine whether $\mu \models \alpha(\iota)$ by the approximations. In this case,
we answer YES or NO randomly.



/* $\underline{\iota}, \overline{\iota}$ : under- and over-approximations to loop invariants */
Input: a membership query $MEM(\mu)$ with $\mu \in Val_{B_P}$
Output: YES or NO

$\theta := \gamma^*(\mu)$;
if $\theta$ is inconsistent then return NO;
if $\theta \Rightarrow \underline{\iota}$ then return YES;
if $\nu \models \neg(\theta \Rightarrow \overline{\iota})$ then return NO;
return YES or NO randomly;

Algorithm 1: Membership Query Resolution

Algorithm 1 shows our membership query resolution algorithm. Note that instead of
giving a random answer when a membership query cannot be resolved by the given invariant
approximations, one can give a more accurate answer by exploiting better approximations from
static analyzers. This learning-based framework is orthogonal to existing static analysis
techniques [H].
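The decision structure of membership query resolution can be sketched as follows. This is an illustration under explicit assumptions, not the authors' implementation: satisfiability and implication are decided by enumerating a small integer range in place of an SMT solver, the loop is the example of Section 1.1, the approximations are $\delta \vee \epsilon$ and $\epsilon \vee \kappa$, and the predicate set and all function names are hypothetical.

```python
import random
from itertools import product

# Bounded stand-in for an SMT solver: all states over a small integer range.
STATES = [dict(n=n, x=x, y=y) for n, x, y in product(range(-3, 4), repeat=3)]

# A hypothetical predicate set P, chosen for illustration.
PREDICATES = {
    "x = y": lambda s: s["x"] == s["y"],
    "x >= 0": lambda s: s["x"] >= 0,
    "x + y = 0": lambda s: s["x"] + s["y"] == 0,
}

def pre(s):
    return s["n"] >= 0 and s["x"] == s["n"] and s["y"] == s["n"]

def guard(s):
    return s["x"] > 0

def post(s):
    return s["x"] + s["y"] == 0

def under(s):
    # under-approximation: pre-condition or post-condition
    return pre(s) or post(s)

def over(s):
    # over-approximation: post-condition or guard
    return post(s) or guard(s)

def gamma_star(mu):
    """Concretize an abstract valuation into a predicate on states (a minterm)."""
    return lambda s: all(PREDICATES[name](s) == value for name, value in mu.items())

def membership(mu, rng=random):
    theta = gamma_star(mu)
    models = [s for s in STATES if theta(s)]
    if not models:
        return "NO"                        # theta is inconsistent
    if all(under(s) for s in models):
        return "YES"                       # theta implies the under-approximation
    if any(not over(s) for s in models):
        return "NO"                        # some state refutes theta implies over
    return rng.choice(["YES", "NO"])       # unresolved: answer randomly
```

Only the last branch is nondeterministic; the first three mirror the inconsistency check and cases (1) and (2) of the text.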



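3.1.2. Answering Equivalence Queries. To answer the equivalence query $EQ(\beta)$, we
concretize the Boolean formula $\beta$ and check if $\gamma(\beta)$ is indeed an invariant of the while
statement for the given pre- and post-conditions. If it is, we are done. Otherwise, we use an
SMT solver to find a witness to $\alpha(\iota) \oplus \beta$. There are three cases:

(1) There is a $\nu$ such that $\nu \models \neg(\underline{\iota} \Rightarrow \gamma(\beta))$. Then $\nu \models \underline{\iota} \wedge \neg\gamma(\beta)$. By Lemmas 2.4 and 2.2,
we have $\alpha^*(\nu) \models \alpha(\iota)$ and $\alpha^*(\nu) \models \neg\beta$. Thus, $\alpha^*(\nu) \models \alpha(\iota) \wedge \neg\beta$.

(2) There is a $\nu$ such that $\nu \models \neg(\gamma(\beta) \Rightarrow \overline{\iota})$. Then $\nu \models \gamma(\beta) \wedge \neg\overline{\iota}$. By Lemma 2.4,
$\alpha^*(\nu) \models \beta$. $\alpha^*(\nu) \models \neg\alpha(\iota)$ by Lemmas 2.4 and 2.2. Hence $\alpha^*(\nu) \models \beta \wedge \neg\alpha(\iota)$.

(3) Otherwise, we cannot find a witness to $\alpha(\iota) \oplus \beta$ by the approximations. In this case,
we give a random abstract counterexample.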



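/* $\{\delta\}$ while $\kappa$ do $S_1; S_2; \cdots; S_m$ done $\{\epsilon\}$ : an annotated loop */
/* $\underline{\iota}, \overline{\iota}$ : under- and over-approximations to loop invariants */

Input: an equivalence query $EQ(\beta)$ with $\beta \in Bool[B_P]$
Output: YES or an abstract counterexample

$\theta := \gamma(\beta)$;
if $\delta \Rightarrow \theta$ and $\theta \Rightarrow \epsilon \vee \kappa$ and $\theta \wedge \kappa \Rightarrow Pre(\theta : S_1; S_2; \cdots; S_m)$ then return YES;
if $\nu \models \neg(\underline{\iota} \Rightarrow \theta)$ or $\nu \models \neg(\theta \Rightarrow \overline{\iota})$ or $\nu \models \neg(\theta \wedge \kappa \Rightarrow Pre(\overline{\iota} : S_1; S_2; \cdots; S_m))$ then
    return $\alpha^*(\nu)$;
return a random abstract counterexample;

Algorithm 2: Equivalence Query Resolution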

Algorithm 2 shows our equivalence query resolution algorithm. Note that Algorithm 2
returns YES only if an invariant is found.

As in the membership query resolution, we give a random answer when an equivalence
query is not resolved by the given invariant approximations. We can still refine the approximations
using some static analysis to give more accurate counterexamples.
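Equivalence query resolution can be sketched in the same bounded style. The code below is an assumption-laden illustration rather than the authors' implementation: a small integer range replaces the SMT solver, the loop is the example of Section 1.1 with approximations $\delta \vee \epsilon$ and $\epsilon \vee \kappa$, the precondition of a formula with respect to the deterministic loop body is evaluated by running the body, and the third counterexample test uses the over-approximation inside the precondition. All names are hypothetical.

```python
import random
from itertools import product

# Bounded stand-in for an SMT solver: all states over a small integer range.
STATES = [dict(n=n, x=x, y=y) for n, x, y in product(range(-3, 4), repeat=3)]

# A hypothetical predicate set P, chosen for illustration.
PREDICATES = {
    "x = y": lambda s: s["x"] == s["y"],
    "x >= 0": lambda s: s["x"] >= 0,
}

def pre(s):
    return s["n"] >= 0 and s["x"] == s["n"] and s["y"] == s["n"]

def guard(s):
    return s["x"] > 0

def post(s):
    return s["x"] + s["y"] == 0

def body(s):
    # loop body: x = x - 1; y = y - 1
    return dict(n=s["n"], x=s["x"] - 1, y=s["y"] - 1)

def under(s):
    return pre(s) or post(s)

def over(s):
    return post(s) or guard(s)

def alpha_star(s):
    return {name: pred(s) for name, pred in PREDICATES.items()}

def equivalence(beta, rng=random):
    """beta is a Boolean function on abstract valuations (dicts over PREDICATES)."""
    theta = lambda s: beta(alpha_star(s))  # concretization of beta
    is_invariant = all(
        (not pre(s) or theta(s))
        and (not theta(s) or post(s) or guard(s))
        and (not (theta(s) and guard(s)) or theta(body(s)))
        for s in STATES
    )
    if is_invariant:
        return "YES"
    for s in STATES:
        if ((under(s) and not theta(s))
                or (theta(s) and not over(s))
                or (theta(s) and guard(s) and not over(body(s)))):
            return alpha_star(s)           # abstract counterexample
    # Unresolved by the approximations: return a random abstract valuation.
    bits = rng.choice(list(product([False, True], repeat=len(PREDICATES))))
    return dict(zip(PREDICATES, bits))
```

A conjecture whose concretization is $x = y \wedge x \geq 0$ is accepted on this domain, while the conjecture $x \geq 0$ is rejected with an abstract counterexample that the conjecture does not satisfy.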



/* $\{\delta\}$ while $\kappa$ do $S_1; S_2; \cdots; S_m$ done $\{\epsilon\}$ : an annotated loop */

Output: a loop invariant for the annotated loop

$\underline{\iota} := \delta \vee \epsilon$;
$\overline{\iota} := \epsilon \vee \kappa$;
repeat
    call a learning algorithm for Boolean formulae where membership and
    equivalence queries are resolved by Algorithms 1 and 2 respectively;
until a loop invariant is found;

Algorithm 3: Main Loop

3.2. Main Loop of Inference Framework. The main loop of the loop invariant inference
algorithm is given in Algorithm 3. We heuristically choose $\delta \vee \epsilon$ and $\epsilon \vee \kappa$ as the under- and
over-approximations respectively. Note that the under-approximation would be stronger if
one used the strongest approximation $\delta$. It is, however, reported that the weaker
approximation $\delta \vee \epsilon$ for the under-approximation is more effective in resolving queries [H]. After
determining the approximations, a learning algorithm is used to find an invariant. In [14],
Jung et al. use the CDNF algorithm with Algorithms 1 and 2 for resolving queries.

Note that the mechanical teacher may give conflicting answers. Random answers to
membership queries may contradict abstract counterexamples from equivalence queries.
Moreover, different valuations may correspond to the same abstract valuation. The learning
algorithm cannot infer any loop invariant in the presence of conflicting answers. When the
mechanical teacher gives conflicting answers, we restart the learning algorithm and search
for another loop invariant. In practice, there are nevertheless sufficiently many invariants for an
annotated loop. The learning-based technique can infer a loop invariant without incurring
any conflicts after a small number of restarts. As empirical evidence, observe the number
of restarts in Table 1. Even without the new predicate generation technique, the numbers of
restarts in all but three examples are less than three. The number of restarts is dramatically
reduced with the new technique, since the technique generates predicates incrementally on
demand and thereby keeps the abstraction parsimonious.

We remark that the learning-based loop invariant inference is a semi-algorithm;
Algorithm 3 terminates with a loop invariant only when there exists one for the loop that can
be expressed with the given set of predicates. If there are not enough atomic predicates to
express any invariant, the algorithm will iterate indefinitely. For example, the tar example in
Section 6 timed out because it turned out to have no invariant expressible with only the atomic
predicates from the program text.



4. Predicate Generation by Interpolation 

One drawback of the learning-based approach to loop invariant inference is that it requires a
set of atomic predicates. It is essential that at least one quantifier-free loop invariant is
representable by the given set $P$ of atomic predicates. Otherwise, concretizations of formulae
in $Bool[B_P]$ cannot be loop invariants, and the mechanical teacher never answers YES to
equivalence queries. To address this problem, we will synthesize new atomic predicates for
the learning-based loop invariant inference framework progressively.

Interpolation is essential to our predicate generation technique. Let $\Theta = [\theta_1, \theta_2, \ldots, \theta_m]$
be an inconsistent sequence of quantifier-free formulae and $\Lambda = [\lambda_0, \lambda_1, \lambda_2, \ldots, \lambda_m]$ its
inductive interpolant. By definition, $\theta_1 \Rightarrow \lambda_1$. Assume $\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_i \Rightarrow \lambda_i$. We have
$\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_{i+1} \Rightarrow \lambda_{i+1}$ since $\lambda_i \wedge \theta_{i+1} \Rightarrow \lambda_{i+1}$. Thus, $\lambda_i$ is an over-approximation to
$\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_i$ for $0 \leq i \leq m$. Moreover, $\sigma(\lambda_i) \subseteq \sigma(\theta_i) \cap \sigma(\theta_{i+1})$. Hence $\lambda_i$ can be seen as
a concise summary of $\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_i$ with restricted symbols. Since each $\lambda_i$ is written in a
less expressive vocabulary, new atomic predicates among variables can be synthesized. We
therefore apply the interpolation theorem to synthesize new atomic predicates and refine
the abstraction.

Our predicate generation technique consists of three components. Before the learning
algorithm is invoked, an initial set of atomic predicates is computed (Section 4.1). When the
learning algorithm fails to infer loop invariants, new atomic predicates are generated to
refine the abstraction (Section 4.2). Lastly, conflicting answers to queries may arise from
predicate abstraction. We further refine the abstraction with these conflicting answers
(Section 4.3). Throughout this section, we consider the annotated loop
$\{\delta\}$ while $\kappa$ do $S_1; S_2; \cdots; S_m$ done $\{\epsilon\}$
with the under-approximation $\underline{\iota}$ and over-approximation $\overline{\iota}$.



4.1. Initial Atomic Predicates. The under- and over-approximations to loop invariants
must satisfy $\underline{\iota} \Rightarrow \overline{\iota}$. Otherwise, there cannot be any loop invariant $\iota$ such that $\underline{\iota} \Rightarrow \iota$ and
$\iota \Rightarrow \overline{\iota}$. Thus, the sequence $[\underline{\iota}, \neg\overline{\iota}]$ is inconsistent. For any interpolant $[T, \lambda, F]$ of $[\underline{\iota}, \neg\overline{\iota}]$,
we have $\underline{\iota} \Rightarrow \lambda$ and $\lambda \Rightarrow \overline{\iota}$. The quantifier-free formula $\lambda$ can be a loop invariant if it
satisfies $\lambda \wedge \kappa \Rightarrow Pre(\lambda : S_1; S_2; \cdots; S_m)$. It is however unlikely that $\lambda$ happens to be a loop
invariant. Yet our loop invariant inference algorithm can generalize $\lambda$ by taking the atomic
predicates in $\lambda$ as the initial atomic predicates. The learning algorithm will try to infer a
loop invariant freely generated by these atomic predicates.



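4.2. Atomic Predicates from Incorrect Conjectures. Consider an equivalence query
$EQ(\beta)$ where $\beta \in Bool[B_P]$ is an abstract conjecture. If the concretization $\theta = \gamma(\beta)$ is
not a loop invariant, we interpolate the loop body with the incorrect conjecture $\theta$. For any
quantifier-free formula $\theta$ over variables $X^{\langle 0 \rangle} \cup X^{\langle 1 \rangle}$, define
$\theta^{\langle k \rangle} = \theta[X^{\langle 0 \rangle} \mapsto X^{\langle k \rangle}, X^{\langle 1 \rangle} \mapsto X^{\langle k+1 \rangle}]$.
The desuperscripted form of a quantifier-free formula $\lambda$ over variables $X^{\langle k \rangle}$ is
$\lambda[X^{\langle k \rangle} \mapsto X]$. Moreover, if $\nu$ is a valuation over $X^{\langle 0 \rangle} \cup \cdots \cup X^{\langle m \rangle}$, $\nu{\downarrow}_{X^{\langle k \rangle}}$ represents a
valuation over $X$ such that $\nu{\downarrow}_{X^{\langle k \rangle}}(x) = \nu(x^{\langle k \rangle})$ for $x \in X$. Let $\phi$ and $\psi$ be quantifier-free
formulae over $X$. Define the following sequence:

$$\Xi(\phi, S_1, \ldots, S_m, \psi) = [\phi^{\langle 0 \rangle}, [\![S_1]\!]^{\langle 0 \rangle}, [\![S_2]\!]^{\langle 1 \rangle}, \ldots, [\![S_m]\!]^{\langle m-1 \rangle}, \neg\psi^{\langle m \rangle}]$$

Observe that

• $\phi^{\langle 0 \rangle}$ and $[\![S_1]\!]^{\langle 0 \rangle}$ share the variables $X^{\langle 0 \rangle}$;

• $[\![S_m]\!]^{\langle m-1 \rangle}$ and $\neg\psi^{\langle m \rangle}$ share the variables $X^{\langle m \rangle}$; and

• $[\![S_i]\!]^{\langle i-1 \rangle}$ and $[\![S_{i+1}]\!]^{\langle i \rangle}$ share the variables $X^{\langle i \rangle}$ for $1 \leq i < m$.

Starting from the program states satisfying $\phi^{\langle 0 \rangle}$, the formula

$$\phi^{\langle 0 \rangle} \wedge [\![S_1]\!]^{\langle 0 \rangle} \wedge [\![S_2]\!]^{\langle 1 \rangle} \wedge \cdots \wedge [\![S_i]\!]^{\langle i-1 \rangle}$$

characterizes the images of $\phi^{\langle 0 \rangle}$ during the execution of $S_1; S_2; \cdots; S_i$.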

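Lemma 4.1. Let $X$ denote the set of variables in the statement $S_1; S_2; \cdots; S_i$, and $\phi$ a
quantifier-free formula over $X$. For any valuation $\nu$ over
$X^{\langle 0\rangle} \cup X^{\langle 1\rangle} \cup \cdots \cup X^{\langle i\rangle}$, the formula
$\phi^{\langle 0\rangle} \land [\![S_1]\!]^{\langle 0\rangle} \land [\![S_2]\!]^{\langle 1\rangle} \land \cdots \land [\![S_i]\!]^{\langle i-1\rangle}$
is satisfied by $\nu$ if and only if
$\nu{\downarrow}_{X^{\langle 0\rangle}} \xrightarrow{S_1} \nu{\downarrow}_{X^{\langle 1\rangle}} \xrightarrow{S_2} \cdots \xrightarrow{S_i} \nu{\downarrow}_{X^{\langle i\rangle}}$
is a program execution and $\nu{\downarrow}_{X^{\langle 0\rangle}} \models \phi$.

Proof. By induction on the length of the statement $S_1; S_2; \cdots; S_i$. Suppose that the lemma is
true for the statement $S_1; S_2; \cdots; S_i$. By definition of program execution,
$\nu{\downarrow}_{X^{\langle i\rangle}} \xrightarrow{S_{i+1}} \nu{\downarrow}_{X^{\langle i+1\rangle}}$
holds if and only if $\nu$ satisfies $[\![S_{i+1}]\!]^{\langle i\rangle}$. Combining this with the induction
hypothesis, the formula
$\phi^{\langle 0\rangle} \land [\![S_1]\!]^{\langle 0\rangle} \land [\![S_2]\!]^{\langle 1\rangle} \land \cdots \land [\![S_{i+1}]\!]^{\langle i\rangle}$
is satisfied by $\nu$ exactly when the extended sequence is a program execution that starts in a
state satisfying $\phi$, and the statement follows. □

By definition, $\phi \Rightarrow Pre(\psi : S_1; S_2; \cdots; S_m)$ implies that the image of $\phi$ must satisfy
$\psi$ after the execution of $S_1; S_2; \cdots; S_m$. The sequence $\Xi(\phi, S_1, \ldots, S_m, \psi)$ is inconsistent if
$\phi \Rightarrow Pre(\psi : S_1; S_2; \cdots; S_m)$. The following proposition will be handy.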

Proposition 4.2. Let $S_1; S_2; \cdots; S_m$ be a sequence of statements. For any $\phi$ with
$\phi \Rightarrow Pre(\psi : S_1; S_2; \cdots; S_m)$, $\Xi(\phi, S_1, \ldots, S_m, \psi)$ has an inductive interpolant.

Proof. By induction on the length of the statement $S_1; S_2; \cdots; S_m$. Suppose the proposition
holds for the statement $S_2; \cdots; S_m$ and an arbitrary formula $\phi$ with
$\phi \Rightarrow Pre(\psi : S_1; S_2; \cdots; S_m)$. By definition of $Pre$,
$Pre(\psi : S_1; S_2; \cdots; S_m) = Pre(Pre(\psi : S_2; \cdots; S_m) : S_1)$. Let $\phi'$
be a formula such that $\phi$ satisfies $\phi'$ after the execution of $S_1$. By induction hypothesis,
$\Xi(\phi', S_2, \ldots, S_m, \psi)$ has an inductive interpolant. Thus, $\Xi(\phi, S_1, \ldots, S_m, \psi)$ also has
an inductive interpolant. □

Let $\Lambda = [\top, \Lambda_1, \Lambda_2, \ldots, \Lambda_{m+1}, \bot]$ be an inductive interpolant of
$\Xi(\phi, S_1, \ldots, S_m, \psi)$. Recall that $\Lambda_i$ is a quantifier-free formula over
$X^{\langle i-1\rangle}$ for $1 \le i \le m + 1$. It is also an over-approximation to the image of $\phi$
after executing $S_1; S_2; \cdots; S_{i-1}$. Proposition 4.2 can be used to generate new atomic
predicates. One simply finds a pair of quantifier-free formulae $\phi$ and $\psi$ with
$\phi \Rightarrow Pre(\psi : S_1; S_2; \cdots; S_m)$, applies the interpolation theorem, and collects the
desuperscripted atomic predicates in an inductive interpolant of $\Xi(\phi, S_1, \ldots, S_m, \psi)$. In the
following, we show how to obtain such pairs with under- and over-approximations to loop
invariants.

4.2.1. Interpolating Over-Approximation. It is not hard to see that an over-approximation
to loop invariants characterizes loop invariants after the execution of the loop body. Recall
that $\iota \Rightarrow \overline{\iota}$ for some loop invariant $\iota$. Moreover,
$\iota \land \kappa \Rightarrow Pre(\iota : S_1; S_2; \cdots; S_m)$. By the monotonicity of
$Pre(\bullet : S_1; S_2; \cdots; S_m)$, we have $\iota \land \kappa \Rightarrow Pre(\overline{\iota} : S_1; S_2; \cdots; S_m)$.

Proposition 4.3. Let $\overline{\iota}$ be an over-approximation to loop invariants of the annotated loop
$\{\delta\}$ while $\kappa$ do $S_1; S_2; \cdots; S_m$ done $\{\epsilon\}$. For any loop invariant $\iota$ with
$\iota \Rightarrow \overline{\iota}$, $\iota \land \kappa \Rightarrow Pre(\overline{\iota} : S_1; S_2; \cdots; S_m)$.

Proof. Since $\iota$ is a loop invariant, $\iota \land \kappa \Rightarrow Pre(\iota : S)$. The statement follows by the
monotonicity of $Pre(\bullet : S)$. □



Proposition 4.3 gives a necessary condition to loop invariants of interest. Recall that
$\theta = \gamma(\beta)$ is an incorrect conjecture of loop invariants. If
$\nu \models \neg(\theta \land \kappa \Rightarrow Pre(\overline{\iota} : S_1; S_2; \cdots; S_m))$,
the mechanical teacher returns the abstract counterexample $\alpha^*(\nu)$. Otherwise,
Proposition 4.2 is applicable with the pair $\theta \land \kappa$ and $\overline{\iota}$.



Corollary 4.4. Let $\overline{\iota}$ be an over-approximation to loop invariants of the annotated loop
$\{\delta\}$ while $\kappa$ do $S_1; S_2; \cdots; S_m$ done $\{\epsilon\}$. For any $\theta$ with
$\theta \land \kappa \Rightarrow Pre(\overline{\iota} : S_1; S_2; \cdots; S_m)$,
the sequence $\Xi(\theta \land \kappa, S_1, S_2, \ldots, S_m, \overline{\iota})$ has an inductive interpolant.

Proof. By Proposition 4.2. □

4.2.2. Interpolating Under-Approximation. For under-approximations, there is no necessary
condition. Nevertheless, Proposition 4.2 is applicable with the pair $\underline{\iota} \land \kappa$ and $\theta$.

Corollary 4.5. Let $\underline{\iota}$ be an under-approximation to loop invariants of the annotated loop
$\{\delta\}$ while $\kappa$ do $S_1; S_2; \cdots; S_m$ done $\{\epsilon\}$. For any $\theta$ with
$\underline{\iota} \land \kappa \Rightarrow Pre(\theta : S_1; S_2; \cdots; S_m)$,
the sequence $\Xi(\underline{\iota} \land \kappa, S_1, S_2, \ldots, S_m, \theta)$ has an inductive interpolant.

Proof. By Proposition 4.2. □



Generating atomic predicates from an incorrect conjecture $\theta$ should now be clear
(Algorithm 4). Assuming that the incorrect conjecture satisfies the necessary condition in
Proposition 4.3, we simply collect all desuperscripted atomic predicates in an inductive
interpolant of $\Xi(\theta \land \kappa, S_1, S_2, \ldots, S_m, \overline{\iota})$ (Corollary 4.4). More atomic
predicates can be obtained from an inductive interpolant of
$\Xi(\underline{\iota} \land \kappa, S_1, S_2, \ldots, S_m, \theta)$ if additionally
$\underline{\iota} \land \kappa \Rightarrow Pre(\theta : S_1; S_2; \cdots; S_m)$ (Corollary 4.5).



/* $\{\delta\}$ while $\kappa$ do $S_1; \cdots; S_m$ done $\{\epsilon\}$ : an annotated loop */
/* $\underline{\iota}, \overline{\iota}$ : under- and over-approximations to loop invariants */

Input: a formula $\theta \in QF[P]$ such that $\theta \land \kappa \Rightarrow Pre(\overline{\iota} : S_1; S_2; \cdots; S_m)$
Output: a set of atomic predicates

$I$ := an inductive interpolant of $\Xi(\theta \land \kappa, S_1, S_2, \ldots, S_m, \overline{\iota})$;
$Q$ := desuperscripted atomic predicates in $I$;
if $\underline{\iota} \land \kappa \Rightarrow Pre(\theta : S_1; S_2; \cdots; S_m)$ then
    $J$ := an inductive interpolant of $\Xi(\underline{\iota} \land \kappa, S_1, S_2, \ldots, S_m, \theta)$;
    $R$ := desuperscripted atomic predicates in $J$;
    $Q := Q \cup R$;
end
return $Q$

Algorithm 4: PredicatesFromConjecture($\theta$)
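The step "collect the desuperscripted atomic predicates in an inductive interpolant" can be sketched as a walk over the formulae of the interpolant. The tuple encoding, the `x<k>` naming of superscripted variables, and the sample interpolant below are assumptions made up for illustration; they are not the output of an actual interpolating prover.

```python
import re

CONNECTIVES = {"and", "or", "not"}


def desuperscript(term):
    """Strip a trailing <k> index from every variable name in a term."""
    if isinstance(term, str):
        return re.sub(r"<\d+>$", "", term)
    if isinstance(term, tuple):
        return tuple(desuperscript(t) for t in term)
    return term                      # integer constants


def atomic_predicates(formula):
    """Collect the desuperscripted atomic predicates of one quantifier-free formula."""
    if isinstance(formula, bool):    # the constants true and false carry no predicate
        return set()
    if isinstance(formula, tuple) and formula[0] in CONNECTIVES:
        atoms = set()
        for sub in formula[1:]:
            atoms |= atomic_predicates(sub)
        return atoms
    return {desuperscript(formula)}


# A made-up interpolant [true, L1, L2, false] over x<0>, y<0> and x<1>, y<1>.
interpolant = [
    True,
    ("<=", "x<0>", "y<0>"),
    ("and", ("<=", "x<1>", "y<1>"), ("not", ("=", "y<1>", 0))),
    False,
]
predicates = set().union(*(atomic_predicates(f) for f in interpolant))
```

Because indices are stripped before insertion into the set, the same predicate found at two different positions of the interpolant is counted once.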



4.3. Atomic Predicates from Conflicting Abstract Counterexamples. Because of
the abstraction, conflicting abstract counterexamples may be given to the learning
algorithm. Consider the example in Section 1. Recall that $n \ge 0 \land x = n \land y = n$ and
$x + y = 0 \lor x > 0$ are the under- and over-approximations respectively. Suppose there is
only one atomic predicate $y = 0$. The learning algorithm tries to infer a Boolean formula
$\lambda \in Bool[b_{y=0}]$. Let us resolve the equivalence queries $EQ(T)$ and $EQ(F)$. On the
equivalence query $EQ(F)$, we check if $F$ is weaker than the under-approximation by an SMT
solver. It is not, and the SMT solver gives the valuation $\nu_0(n) = \nu_0(x) = \nu_0(y) = 1$ as a
witness. Applying the abstraction function $\alpha^*$ to $\nu_0$, the mechanical teacher returns the
abstract counterexample $b_{y=0} \mapsto F$. The abstract counterexample is intended to notify that
the target formula $\lambda$ and $F$ have different truth values when $b_{y=0}$ is $F$. That is, $\lambda$ is satisfied
by the valuation $b_{y=0} \mapsto F$.

On the equivalence query $EQ(T)$, the mechanical teacher checks if $T$ is stronger than
the over-approximation. It is not, and the SMT solver now returns the valuation
$\nu_1(x) = 0, \nu_1(y) = 1$ as a witness. The mechanical teacher in turn computes $b_{y=0} \mapsto F$ as the
corresponding abstract counterexample. The abstract counterexample notifies that the
target formula $\lambda$ and $T$ have different truth values when $b_{y=0}$ is $F$. That is, $\lambda$ is not
satisfied by the valuation $b_{y=0} \mapsto F$. Yet the target formula $\lambda$ cannot be both satisfied and
unsatisfied by the valuation $b_{y=0} \mapsto F$. We have conflicting abstract counterexamples.

Such conflicting abstract counterexamples arise because the abstraction is too coarse.
This gives us another chance to refine the abstraction. For distinct valuations $\nu$ and $\nu'$,
$\Gamma(\nu) \land \Gamma(\nu')$ is inconsistent. For instance,
$\Gamma(\nu_0) = (n = 1) \land (x = 1) \land (y = 1)$, $\Gamma(\nu_1) = (x = 0) \land (y = 1)$, and
$\Gamma(\nu_1) \land \Gamma(\nu_0)$ is inconsistent.



/* $\{\delta\}$ while $\kappa$ do $S_1; S_2; \cdots; S_m$ done $\{\epsilon\}$ : an annotated loop */
Input: distinct valuations $\nu$ and $\nu'$ such that $\alpha^*(\nu) = \alpha^*(\nu')$
Output: a set of atomic predicates

$\chi := \Gamma(\nu)$;
$\chi' := \Gamma(\nu')$;
/* $\chi \land \chi'$ is inconsistent */
$Q$ := atomic predicates in an inductive interpolant of $[\chi, \chi' \lor \neg\rho]$;
return $Q$;

Algorithm 5: PredicatesFromConflict($\nu$, $\nu'$)



Algorithm 5 generates atomic predicates from conflicting abstract counterexamples. Let
$\nu$ and $\nu'$ be distinct valuations in $Val_X$. We compute formulae $\chi = \Gamma(\nu)$ and $\chi' = \Gamma(\nu')$.
Since $\nu$ and $\nu'$ are conflicting, they correspond to the same abstract valuation
$\alpha^*(\nu) = \alpha^*(\nu')$. Let $\rho = \gamma^*(\alpha^*(\nu))$. We have $\chi \Rightarrow \rho$ and $\chi' \Rightarrow \rho$ [14].
Recall that $\chi \land \chi'$ is inconsistent. Since $\chi \Rightarrow \rho$, the sequence $[\chi, \chi' \lor \neg\rho]$ is also
inconsistent. Algorithm 5 returns atomic predicates in an inductive interpolant of $[\chi, \chi' \lor \neg\rho]$.
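The conflict in the running example can be replayed concretely. The sketch below is an illustration only: valuations are plain dictionaries, `abstract` plays the role of $\alpha^*$, `gamma_formula` prints $\Gamma(\nu)$ as a string, and all three names are hypothetical. The extra predicate $x = 0$ is taken from one valid interpolant of the sequence, $\neg(x = 0) \land \neg(y = 0)$; no claim is made that an interpolating prover would return this particular one.

```python
def gamma_formula(valuation):
    """Gamma(nu): the conjunction of equalities x = nu(x), rendered as a string."""
    return " ∧ ".join(f"{x} = {v}" for x, v in sorted(valuation.items()))


def abstract(valuation, predicates):
    """alpha*(nu): the truth value of every atomic predicate under nu."""
    return {name: holds(valuation) for name, holds in predicates.items()}


predicates = {"y = 0": lambda v: v["y"] == 0}
nu0 = {"n": 1, "x": 1, "y": 1}   # witness for EQ(F)
nu1 = {"x": 0, "y": 1}           # witness for EQ(T)

# Both witnesses collapse to the same abstract valuation: a conflict.
conflict = abstract(nu0, predicates) == abstract(nu1, predicates)

# With the additional predicate x = 0 the two valuations are told apart.
refined = dict(predicates)
refined["x = 0"] = lambda v: v["x"] == 0
resolved = abstract(nu0, refined) != abstract(nu1, refined)
```

Once the abstraction distinguishes the two witnesses, the learning algorithm no longer receives contradictory answers for the same abstract valuation.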



5. Loop Invariant Inference Algorithms with Predicate Generation

Algorithm 6 is the main loop of the inference framework with predicate generation. The
algorithm is the same as Algorithm 3 except for the gray-boxed parts.

We first compute the initial set of atomic predicates by interpolating $\underline{\iota}$ and
$\neg\overline{\iota}$ (Section 4.1). With the initial set, we run the learning process until the algorithm
finds a loop invariant or an exception is raised. An exception means that the current set of
predicates might not be enough to find a loop invariant. In this case we need to find more
predicates using one of the algorithms explained in Section 4.

The learning algorithm finds conflicting abstract counterexamples when the equivalence
query resolution algorithm gives a random counterexample that contradicts the previous
ones or the current predicate abstraction is too coarse. Since we cannot distinguish the
two, we always generate more predicates using Algorithm 5, hoping that we can find a loop
invariant in the next iteration.

The ExcessiveRandomAnswers exception is raised when our new equivalence query
resolution algorithm, which is detailed later, suspects that it generates too many random
counterexamples because of the coarse predicate abstraction. In this case, we generate more
predicates using Algorithm 4.

Note that we restart the learning algorithm from scratch every time we generate more
predicates, because the CDNF algorithm we use for learning handles only a fixed number of
Boolean variables. Recently, Chen et al. [5] proposed a variant of the CDNF algorithm that
supports incremental learning. We could also adopt this algorithm to improve the efficiency
of the overall technique.

/* CEX : a set of counterexamples */
/* $\tau$ : a threshold to generate new atomic predicates */
/* $\{\delta\}$ while $\kappa$ do $S_1; S_2; \cdots; S_m$ done $\{\epsilon\}$ : an annotated loop */
Output: a loop invariant for the annotated loop

$\underline{\iota} := \delta \lor \epsilon$;
$\overline{\iota} := \epsilon \lor \kappa$;
$P$ := InitialAtomicPredicates();
repeat
    try
        call a learning algorithm for Boolean formulae where membership and
        equivalence queries are resolved by Algorithms 1 and 7 respectively;
    catch ConflictAbstractCEX →
        find distinct valuations $\nu$ and $\nu'$ in CEX such that $\alpha^*(\nu) = \alpha^*(\nu')$;
        $P := P \cup$ PredicatesFromConflict($\nu$, $\nu'$);
    catch ExcessiveRandomAnswers($\theta$) →
        $P := P \cup$ PredicatesFromConjecture($\theta$);
    $\tau := \lceil 1.3^{|P|} \rceil$;
until a loop invariant is found;

Algorithm 6: Main Loop with Predicate Generation
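The control flow of the main loop can be sketched as a restart loop driven by two exceptions. Everything below is a simplified illustration under assumptions: the learner and the two predicate generators are passed in as callables, the conflicting valuations are carried by the exception instead of being looked up in CEX, and the threshold is assumed to be $\lceil 1.3^{|P|} \rceil$ and is recomputed at the start of each round.

```python
import math


class ConflictAbstractCEX(Exception):
    def __init__(self, nu, nu_prime):
        super().__init__("conflicting abstract counterexamples")
        self.nu, self.nu_prime = nu, nu_prime


class ExcessiveRandomAnswers(Exception):
    def __init__(self, theta):
        super().__init__("too many random abstract counterexamples")
        self.theta = theta


def infer_invariant(initial, learn, from_conflict, from_conjecture):
    """Restart `learn` with a larger predicate set until it returns an invariant.

    Like the main loop it sketches, this has no termination guarantee.
    """
    predicates = set(initial)
    while True:
        threshold = math.ceil(1.3 ** len(predicates))   # assumed form of the threshold
        try:
            return learn(predicates, threshold), predicates
        except ConflictAbstractCEX as conflict:
            predicates |= from_conflict(conflict.nu, conflict.nu_prime)
        except ExcessiveRandomAnswers as excess:
            predicates |= from_conjecture(excess.theta)
```

Each handler only enlarges the predicate set; the learner itself is started afresh on the next iteration, matching the restart behaviour described above.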

/* CEX : a set of counterexamples */
/* $\tau$ : a threshold to generate new atomic predicates */
/* $\{\delta\}$ while $\kappa$ do $S_1; S_2; \cdots; S_m$ done $\{\epsilon\}$ : an annotated loop */
/* $\underline{\iota}, \overline{\iota}$ : under- and over-approximations to loop invariants */

Input: an equivalence query $EQ(\beta)$ with $\beta \in Bool[B_P]$
Output: YES or an abstract counterexample

$\theta := \gamma(\beta)$;
if $\delta \Rightarrow \theta$ and $\theta \Rightarrow \epsilon \lor \kappa$ and $\theta \land \kappa \Rightarrow Pre(\theta : S_1; S_2; \cdots; S_m)$ then return YES;
if $\nu \models \neg(\underline{\iota} \Rightarrow \theta)$ or $\nu \models \neg(\theta \Rightarrow \overline{\iota})$ or $\nu \models \neg(\theta \land \kappa \Rightarrow Pre(\overline{\iota} : S_1; S_2; \cdots; S_m))$ then
    CEX := CEX $\cup \{\nu\}$;
    return $\alpha^*(\nu)$;
if the number of random abstract counterexamples $< \tau$ then
    return a random abstract counterexample;
else
    throw ExcessiveRandomAnswers($\theta$);

Algorithm 7: Equivalence Query Resolution with Predicate Generation

Table 1. Experimental Results.

P : # of atomic predicates, MEM : # of membership queries, EQ : # of equivalence
queries, RE : # of the learning algorithm restarts, T : total elapsed time (s).
For BLAST, the preprocessing time is given in parentheses.

                       |       Previous [14]       |        Current        | BLAST [20]
case              SIZE |  P    MEM   EQ  RE      T |  P  MEM  EQ  RE     T |  P  T
ide-ide-tape        16 |  6     13    7   1   0.05 |  4    6   5   1  0.05 | 21  1.31 (1.07)
ide-wait-ireason     9 |  5    790  445  33   1.51 |  5  122  91   7  1.09 |  9  0.19 (0.14)
parser              37 | 17  4,223  616  13  13.45 |  9   86  32   1  0.46 |  8  0.74 (0.49)
riva                82 | 20     59   11   2   0.51 |  7   14   5   1  0.37 | 12  1.50 (1.17)
tar                  7 |  6      ∞    ∞   ∞      ∞ |  2    2   5   1  0.02 | 10  0.20 (0.17)
usb-message         18 | 10     21    7   1   0.10 |  3    7   6   1  0.04 |  4  0.18 (0.14)
vpr                  8 |  5     16    9   2   0.05 |  1    1   3   1  0.01 |  4  0.13 (0.10)

The equivalence query resolution algorithm is given in Algorithm 7. Again, we put gray
boxes to denote the modified parts. As in Algorithm 2, the mechanical teacher first checks
if the concretization of the abstract conjecture is a loop invariant. If so, it returns YES
and concludes the loop invariant inference algorithm. Otherwise, the mechanical teacher
compares the concretization of the abstract conjecture with approximations to loop
invariants. If the concretization is not weaker than the under-approximation, not stronger
than the over-approximation, or does not satisfy the necessary condition given in
Proposition 4.3, an abstract counterexample is returned after recording the witness
valuation [14, 16]. The witnessing valuations are needed to synthesize atomic predicates in
Algorithm 6 when conflicts occur.

If the concretization is not a loop invariant and falls between both approximations to
loop invariants, there are two possibilities. The current set of atomic predicates is sufficient
to express a loop invariant; the learning algorithm just needs a few more iterations to infer
a solution. Or, the current atomic predicates are insufficient to express any loop invariant;
the learning algorithm cannot derive a solution with these predicates. Since we cannot tell
which scenario arises, a threshold is deployed heuristically. If the number of random
abstract counterexamples is less than the threshold, we give the learning algorithm more time
to find a loop invariant. Only when the number of random abstract counterexamples reaches
the threshold do we synthesize more atomic predicates for abstraction refinement.
Intuitively, the current atomic predicates are likely to be insufficient if lots of random abstract
counterexamples have been generated. In this case, we raise the ExcessiveRandomAnswers
exception to synthesize more atomic predicates from the incorrect conjecture in Algorithm 6.
Observe that in Algorithm 6 the threshold $\tau$ is set to $\lceil 1.3^{|P|} \rceil$, the approximate size of the
search space, which we found empirically.
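The three outcomes of the equivalence query resolution can be sketched over a small finite state space, where formulae become sets of states and implication becomes set inclusion. This is an illustration under assumptions, not the SMT-based procedure: `pre` is a brute-force weakest precondition for a deterministic loop body, the `teacher` dictionary and all function names are hypothetical, the under-approximation is simply taken to be the precondition, and the "random" counterexample is a fixed stand-in.

```python
class ExcessiveRandomAnswers(Exception):
    def __init__(self, theta):
        super().__init__("too many random abstract counterexamples")
        self.theta = theta


def pre(post, step, states):
    """States all of whose successors under `step` lie in `post`."""
    return {s for s in states if all(t in post for t in step(s))}


def resolve_equivalence(theta, loop, teacher):
    """Answer an equivalence query for a conjecture given as a set of states."""
    delta, kappa, eps, under, over, step, states = loop
    if delta <= theta and theta <= (eps | kappa) and (theta & kappa) <= pre(theta, step, states):
        return "YES"
    witnesses = (under - theta) | (theta - over) | ((theta & kappa) - pre(over, step, states))
    if witnesses:
        nu = min(witnesses)          # any witness would do; min keeps the sketch deterministic
        teacher["cex"].add(nu)
        return teacher["abstract"](nu)
    if teacher["random_answers"] < teacher["tau"]:
        teacher["random_answers"] += 1
        return teacher["random_cex"]()
    raise ExcessiveRandomAnswers(theta)


# States are pairs (x, y); the loop is: while x > 0 do x := x - 1; y := y - 1 done.
states = {(x, y) for x in range(4) for y in range(4)}
delta = {s for s in states if s[0] == s[1]}          # precondition x = y
kappa = {s for s in states if s[0] > 0}              # loop guard
eps = {s for s in states if s[0] + s[1] == 0}        # postcondition
loop = (delta, kappa, eps, delta, eps | kappa, lambda s: [(s[0] - 1, s[1] - 1)], states)
teacher = {
    "cex": set(),
    "abstract": lambda s: {"y = 0": s[1] == 0},
    "random_answers": 0,
    "tau": 1,
    "random_cex": lambda: {"y = 0": True},           # fixed stand-in for a random answer
}
```

A conjecture such as `delta | {(2, 1)}` passes all three witness checks without being inductive, so it is answered with a random abstract counterexample until the threshold is reached and the exception is raised.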



6. Experimental Results

We have implemented the proposed technique in OCaml. In our implementation, the SMT
solver Yices and the interpolating theorem prover CSIsat [1] are used for query resolution
and interpolation respectively. In addition to the examples in [14], we add two more
examples: riva is the largest loop expressible in our simple language from Linux¹, and tar
is extracted from Tar². All examples are translated into annotated loops manually. Data
are the average of 100 runs and collected on a 2.4GHz Intel Core2 Quad CPU with 8GB
memory running Linux 2.6.31 (Table 1).

¹ In Linux 2.6.30 drivers/video/riva/riva_hw.c: nv10CalcArbitration()
² In Tar 1.13 src/mangle.c: extract_mangle()

{ size = M ∧ copy = N }
while size > 0 do
    available := nondet;
    if available > size then
        copy := copy + available;
        size := size - available;
done
{ size = 0 ⇒ copy = M + N }

Figure 3. A Sample Loop in Tar

In the table, the column Previous represents the work in [14] where atomic predicates
are chosen heuristically. Specifically, all atomic predicates in pre- and post-conditions, loop
guards, and conditions of if statements are selected. The column Current gives the 
results for our automatic predicate generation technique. Interestingly, heuristically chosen 
atomic predicates suffice to infer loop invariants for all examples except tar. For the tar 
example, the learning-based loop invariant inference algorithm fails to find a loop invariant 
due to ill-chosen atomic predicates. In contrast, our new algorithm is able to infer a loop 
invariant for the tar example in 0.02s. The number of atomic predicates can be significantly 
reduced as well. Thanks to a smaller number of atomic predicates, loop invariant inference 
becomes more economical in these examples. Without predicate generation, four of the six 
examples take more than one second. Only one of these examples takes more than one 
second using the new technique. In particular, the parser example improves by more than an
order of magnitude, from 13.45s to 0.46s.

The column BLAST gives the results of the lazy abstraction technique with interpolants
implemented in BLAST [20]. In addition to the total elapsed time, we also show the
preprocessing time in parentheses. Since the learning-based framework does not construct abstract
models, our new technique outperforms BLAST in all cases but one (ide-wait-ireason).
If we disregard the time for preprocessing in BLAST, the learning-based technique still wins
three cases (ide-ide-tape, tar, vpr) and ties one (usb-message). Also note that the number
of atomic predicates generated by the new technique is always smaller except for parser.
Given the simplicity of the learning-based framework, our preliminary experimental results
suggest a promising outlook for further optimizations.
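The comparison with BLAST can be recomputed directly from the timing columns of Table 1. The short sketch below copies the Current and BLAST times from the table; the function name `compare` and the rounding to two decimals (needed because the table values are themselves rounded) are choices made for this illustration.

```python
# (case, Current T, BLAST total T, BLAST preprocessing T), in seconds, from Table 1.
rows = [
    ("ide-ide-tape", 0.05, 1.31, 1.07),
    ("ide-wait-ireason", 1.09, 0.19, 0.14),
    ("parser", 0.46, 0.74, 0.49),
    ("riva", 0.37, 1.50, 1.17),
    ("tar", 0.02, 0.20, 0.17),
    ("usb-message", 0.04, 0.18, 0.14),
    ("vpr", 0.01, 0.13, 0.10),
]


def compare(rows, include_preprocessing):
    """Return the cases the learning-based technique wins and the cases it ties."""
    wins, ties = [], []
    for case, current, total, preprocessing in rows:
        blast = total if include_preprocessing else round(total - preprocessing, 2)
        if current < blast:
            wins.append(case)
        elif current == blast:
            ties.append(case)
    return wins, ties


wins_total, ties_total = compare(rows, include_preprocessing=True)
wins_core, ties_core = compare(rows, include_preprocessing=False)
```

With preprocessing included, every case except ide-wait-ireason is a win; without it, the wins reduce to ide-ide-tape, tar and vpr, with usb-message tied.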

6.1. tar from Tar. This simple fragment is excerpted from the code for copying two
buffers. M items in the source buffer are copied to the target buffer that already has N
items. The variable size keeps the number of remaining items in the source buffer and
copy denotes the number of items in the target buffer after the last copy. In each iteration,
an arbitrary number of items are copied and the values of size and copy are updated
accordingly.

Observe that the atomic predicates in the program text cannot express any loop
invariant that proves the specification. However, our new algorithm successfully finds the
following loop invariant in this example:

$$M + N \le copy + size \;\land\; copy + size \le M + N$$

The loop invariant asserts that the number of items in both buffers is equal to $M + N$.
It requires atomic predicates unavailable from the program text. Predicate generation is
essential to find loop invariants for such tricky loops.
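The invariant can be checked empirically by simulating the loop of Figure 3. This is a sketch under assumptions: both assignments are taken to be inside the `if` branch (otherwise the sum would not be preserved), `nondet` is replaced by a bounded random draw, and an iteration cap is added because the simulated loop need not terminate on its own. The function name is hypothetical.

```python
import random


def run_tar_loop(m, n, rng, max_iterations=1000):
    """Simulate the tar loop and return the (size, copy) pairs seen at each loop head."""
    size, copy = m, n
    trace = [(size, copy)]
    for _ in range(max_iterations):
        if not size > 0:
            break
        available = rng.randint(0, 2 * m)    # stands in for `nondet`
        if available > size:
            copy += available
            size -= available
        trace.append((size, copy))
    return trace


trace = run_tar_loop(5, 3, random.Random(0))
invariant_holds = all(size + copy == 5 + 3 for size, copy in trace)
```

Every state in the trace satisfies `size + copy == M + N`, which is the conjunction of the two inequalities in the inferred invariant.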

6.2. parser from SPEC2000 Benchmarks. For the parser example (Figure 4), 9 atomic
predicates are generated. These atomic predicates are a subset of the 17 atomic predicates
from the program text. Every loop invariant found by the loop invariant inference algorithm
contains all 9 atomic predicates. This suggests that there are no redundant predicates. Fewer
atomic predicates make loop invariants easier to comprehend. For instance, consider the
following loop invariant:

$$
\begin{array}{l}
(success \lor give\_up) \Rightarrow \\
\quad (valid \ne 0 \lor cutoff = maxcost \lor words < count) \;\land \\
\quad (\neg search \lor valid \ne 0 \lor words < count) \;\land \\
\quad (linkages = canonical \land linkages \ge valid \land linkages \le 5000)
\end{array}
$$

The invariant is simpler and thus easier to understand than the one presented in [14].
The right side of the implication summarizes the condition when success or give_up becomes
true.

{ phase = F ∧ success = F ∧ give_up = F ∧ cutoff = 0 ∧ count = 0 }
while ¬(success ∨ give_up) do
    entered_phase := F;
    if ¬phase then
        if cutoff = 0 then cutoff := 1;
        else if cutoff = 1 ∧ maxcost > 1 then cutoff := maxcost;
        else phase := T; entered_phase := T; cutoff := 1000;
        if cutoff = maxcost ∧ ¬search then give_up := T;
    else
        count := count + 1;
        if count > words then give_up := T;
    if entered_phase then count := 1;
    linkages := nondet;
    if linkages > 5000 then linkages := 5000;
    canonical := 0; valid := 0;
    if linkages ≠ 0 then
        valid := nondet;
        assume 0 ≤ valid ∧ valid ≤ linkages;
        canonical := linkages;
    if valid > 0 then success := T;
done
{ (valid > 0 ∨ count > words ∨ (cutoff = maxcost ∧ ¬search)) ∧
  valid ≤ linkages ∧ canonical = linkages ∧ linkages ≤ 5000 }

Figure 4. A Sample Loop in SPEC2000 Benchmark PARSER

Fewer atomic predicates also lead to a smaller standard deviation of the execution
time. The execution time now ranges from 0.36s to 0.58s with the standard deviation
equal to 0.06. In contrast, the execution time for [14] ranges from 1.20s to 80.20s with the
standard deviation equal to 14.09. By Chebyshev's inequality, the new algorithm infers a
loop invariant within one second with probability greater than 0.988. With a compact set of
atomic predicates, the loop invariant inference algorithm performs rather predictably.
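The Chebyshev bound can be reproduced from the reported figures. The sketch below assumes the mean is the 0.46s average listed for parser in Table 1 and uses the two-sided inequality $P(|X - \mu| \ge d) \le (\sigma / d)^2$; the function name is hypothetical. With these rounded inputs the bound evaluates to about 0.9877, close to the quoted 0.988; the small gap presumably comes from rounding of the published mean and deviation.

```python
def chebyshev_lower_bound(mean, std, limit):
    """Lower bound on P(X < limit) derived from P(|X - mean| >= d) <= (std / d) ** 2."""
    d = limit - mean
    if d <= 0:
        raise ValueError("limit must exceed the mean")
    return max(0.0, 1.0 - (std / d) ** 2)


# Mean 0.46s (Table 1), standard deviation 0.06s, time limit 1s.
bound = chebyshev_lower_bound(mean=0.46, std=0.06, limit=1.0)
```

The same function gives no useful bound for the earlier technique at the one-second limit, since its reported mean of 13.45s already exceeds the limit.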



{ retries = 100 ∧ (¬ireason_has_ATAPI_COD ∨ ireason_has_ATAPI_IO) }
while retries ≠ 0 ∧ (¬ireason_has_ATAPI_COD ∨ ireason_has_ATAPI_IO) do
    retries := retries - 1;
    ireason_has_ATAPI_COD := nondet;
    ireason_has_ATAPI_IO := nondet;
    if retries = 0 then
        ireason_has_ATAPI_COD := T;
        ireason_has_ATAPI_IO := F;
done
{ retries < 100 ∧ ireason_has_ATAPI_COD ∧ ¬ireason_has_ATAPI_IO }

Figure 5. A Sample Loop in Linux IDE Driver

6.3. ide-wait-ireason from Linux Device Driver. In the ide-wait-ireason example
(Figure 5), predicate generation performs better even though it generates the same number
of atomic predicates. This is because the technique can synthesize the atomic predicate
$retries \le 100$, which does not appear in the program text but is essential to loop invariants.
Surely this atomic predicate is expressible by the two atomic predicates $retries = 100$ and
$retries < 100$ from the program text. However, the search space is significantly reduced with
the more succinct atomic predicate $retries \le 100$. Consequently, the learning algorithm only
needs a quarter of the queries to infer a loop invariant.
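A quick simulation of the loop in Figure 5 shows why $retries \le 100$ is the useful predicate: it holds at every loop head, whereas $retries < 100$ already fails on entry. The sketch assumes the reconstruction of the loop body shown in the figure, replaces `nondet` by random Boolean draws, and uses a hypothetical function name.

```python
import random


def run_ide_wait_ireason(rng, cod=False, io=False):
    """Simulate the loop; return the final state and the values of retries at loop heads."""
    retries = 100
    assert (not cod) or io                   # the precondition of the annotated loop
    heads = []
    while retries != 0 and ((not cod) or io):
        heads.append(retries)
        retries -= 1
        cod = rng.choice([True, False])      # stands in for `nondet`
        io = rng.choice([True, False])       # stands in for `nondet`
        if retries == 0:
            cod, io = True, False
    return retries, cod, io, heads


retries, cod, io, heads = run_ide_wait_ireason(random.Random(1))
postcondition_holds = retries < 100 and cod and not io
```

The first loop head always sees `retries == 100`, so an invariant must admit that value, while the postcondition only needs the strict bound once the body has run at least once.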

7. Conclusions 

A predicate generation technique for learning-based loop invariant inference was presented.
The technique applies the interpolation theorem to synthesize atomic predicates implicitly
implied by program texts. To evaluate the efficiency of the new technique, results for examples
excerpted from Linux, SPEC2000, and Tar source code were reported. With predicate
generation, the learning-based loop invariant inference algorithm is more effective and
performs much better in these realistic examples.

More experiments are always needed. In particular, we would like to have more realistic
examples which require implicit predicates unavailable in program texts. Additionally,
loops manipulating arrays often require quantified loop invariants with linear inequalities.
Extension to quantified loop invariants is also important.